Distinguishing Functional DNA Words; A Method for Measuring Clustering Levels
Authors
Abstract
Functional DNA sub-sequences and genome elements are spatially clustered along the genome, just as keywords are in literary texts. Therefore, some of the methods for ranking words in texts can also be used to compare different DNA sub-sequences. By analogy with literary texts, we claim that the distribution of distances between successive occurrences of a sub-sequence (word) is q-exponential, the distribution function of non-extensive statistical mechanics. The q-parameter can thus be used as a measure of a word's clustering level. We analyzed the distribution of distances between consecutive occurrences of each of the 16 possible dinucleotides in human chromosomes to obtain their corresponding q-parameters. We found that CG, a two-letter word that is biologically important because of its methylation, has the highest clustering level. This finding shows the predictive ability of the method in biology. We also proposed that chromosome 18, which has the largest q-parameter for gene promoters, is more sensitive to dietary and lifestyle factors. We extended the study to compare the genomes of selected organisms and concluded that the clustering level of CGs is higher in evolutionarily higher organisms than in lower ones.
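The measurement described in the abstract can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' implementation: the function names, the choice to fit the q-exponential to the survival function of the distances (rather than to their density), the allowance for overlapping occurrences, and the coarse grid search are all assumptions made here for clarity. The q-exponential is taken in its standard form, e_q(-βx) = [1 - (1 - q)βx]^(1/(1 - q)), which reduces to exp(-βx) as q approaches 1.

```python
import math


def inter_occurrence_distances(seq, word):
    """Distances between start positions of successive occurrences of `word`.

    Overlapping occurrences are counted (an assumption of this sketch).
    """
    positions = []
    start = seq.find(word)
    while start != -1:
        positions.append(start)
        start = seq.find(word, start + 1)
    return [b - a for a, b in zip(positions, positions[1:])]


def q_exponential(x, q, beta):
    """Standard q-exponential e_q(-beta * x); equals exp(-beta * x) at q = 1."""
    if abs(q - 1.0) < 1e-12:
        return math.exp(-beta * x)
    base = 1.0 - (1.0 - q) * beta * x
    if base <= 0.0:
        return 0.0
    return base ** (1.0 / (1.0 - q))


def empirical_survival(distances):
    """Pairs (d, fraction of distances >= d) for each distinct distance d."""
    n = len(distances)
    return [(d, sum(1 for x in distances if x >= d) / n)
            for d in sorted(set(distances))]


def fit_q_exponential(points, q_grid, beta_grid):
    """Coarse least-squares grid search for (q, beta) over (x, survival) pairs.

    A hypothetical stand-in for a proper nonlinear fit.
    """
    best = None
    for q in q_grid:
        for beta in beta_grid:
            err = sum((s - q_exponential(x, q, beta)) ** 2 for x, s in points)
            if best is None or err < best[0]:
                best = (err, q, beta)
    return best[1], best[2]


# Toy usage on a made-up sequence (not real genomic data).
gaps = inter_occurrence_distances("ACGTCGCG", "CG")
survival = empirical_survival(gaps)
```

In this picture, a word whose fitted q lies further above 1 has a heavier-tailed distance distribution, which is what the abstract interprets as a higher clustering level. On real chromosomes one would replace the grid search with a dedicated curve-fitting routine.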
Similar resources
Measuring the Diameter of Nanofibers Extracted from Polyblend Fibers Using FCM Clustering Method
Transcriptome Analysis of Minimal Residual Disease in Subtypes of Pediatric B Cell Acute Lymphoblastic Leukemia
Acute lymphoblastic leukemia (ALL) is the most common childhood cancer and the leading cause of cancer-related death in children and adolescents. Minimal residual disease (MRD) is a strong, independent prognostic factor. The objective of this study was to identify molecular signatures distinguishing patients with positive MRD from those with negative MRD in different subtypes of ALL, and to ide...
Application of Biclustering with the "Large Average Submatrices" Method to Gene Expression Data from DNA Microarrays
Background and Objective: In recent years, DNA microarray technology has become a central tool in genomic research. Using this technology, which made it possible to simultaneously analyze expression levels for thousands of genes under different conditions, massive amounts of information will be obtained. While traditional clustering methods, such as hierarchical and K-means clustering have been...
Clustering of a Number of Genes Affecting in Milk Production using Information Theory and Mutual Information
Information theory is a branch of mathematics. Information theory is used in genetic and bioinformatics analyses and can be used for many analyses related to the biological structures and sequences. Bio-computational grouping of genes facilitates genetic analysis, sequencing and structural-based analyses. In this study, after retrieving gene and exon DNA sequences affecting milk yield in dairy ...
Distinguishing Functional Constipation from Organic Causes in Children
The diagnosis of functional constipation (FC) is usually straightforward. Almost 95% of childhood constipation is functional in nature. The remaining 5% can be attributed to a wide variety of conditions. Many of these etiologies are obvious from history and examination, and many have other specific symptoms besides constipation. It is especially important to look for presence of any symptoms or si...